Gene clusters reflecting macrodomain structure respond to nucleoid perturbations

Vittore F. Scolari,a,b,c Bruno Bassetti,c,d Bianca Sclavi,e and Marco Cosentino Lagomarsino*a,b,c

Abstract 

Focusing on the DNA-bridging nucleoid proteins Fis and H-NS, and integrating several independent experimental
and bioinformatic data sources, we investigate the links between chromosomal spatial organization and global
transcriptional regulation. By means of a novel multi-scale spatial aggregation analysis, we uncover the existence
of contiguous clusters of nucleoid-perturbation sensitive genes along the genome, whose expression is affected by
a combination of topological DNA state and nucleoid-shaping protein occupancy. The clusters correlate well with
the macrodomain structure of the genome. The most significant of them lie symmetrically at the edges of the Ter
macrodomain and involve all of the flagellar and chemotaxis machinery, in addition to key regulators of biofilm
formation, suggesting that the regulation of the physical state of the chromosome by the nucleoid proteins plays
an important role in coordinating the transcriptional response leading to the switch between a motile and a biofilm
lifestyle.

Introduction



The success of a cell's survival under different growth conditions depends on its ability to regulate a 
coordinated transcriptional response to specific environmental changes or stresses, involving large groups 
of genes [1,2]. In bacteria, transcriptional regulation of genes depends on the binding of proteins to DNA,
but also on the physical configurations of the resulting mesoscopic protein/DNA complex, the nucleoid J3- 

Specific nucleoid-shaping transcription factors (NSTFs) have the task of modulating the nucleoid's dynamic
structure in response to changes in environmental conditions [6]. This can result, for example, in a
change in the compaction of the chromosome and in a differential distribution of mechanical energy, stored 
as supercoiling. These changes in the physical properties of the DNA can affect the level of expression 
of specific genes, in parallel to the activity of specific transcription factors. NSTFs may thus change the 
expression of many genes (some of which may code for the same NSTFs) both directly and through the 
physical conformations that they induce on the genome Q. 

The current transcription network view of gene regulation represents specific transcription-factor binding 
sites upstream of promoters as a directed graph, linking each transcription factor to its target node (which 
represents the transcript and its protein product) if the transcription factor has at least one binding site with 
documented activity in the cis-regulatory region of the target [7]. With this definition, the interaction graph
structure is given by both large-scale experiments and collections of small-scale experiments [8]. This view
considers the graph of all genome (transcriptional) interactions but completely disregards the effects on 
gene expression due to changes of the nucleoid. The role played by the nucleoid's structure in the hierarchy 
of events leading to large-scale transcription patterns remains largely to be elucidated. In the near future it
will be necessary to incorporate these effects into a generalized description of the organization of the genome and
the regulatory network it encodes. This is a challenging problem because the output of the transcription
network is due to the sum of both local and global regulatory signals, from the biochemical properties of 
transcription factors, such as their concentration and affinity for the sites on the promoter, to the mesoscopic 
nucleoid organization and DNA conformational and topological states. 

Nucleoid organization itself remains to be fully characterized. Most of NSTFs have been identified JH, 
and in some cases their local action is well known: for example, DNA bridging by H-NS or Fis is believed 
to be important for DNA loop stabilization [9]. Such supercoiled plectonemic loops can be topologically
isolated from the rest of the genome, modulating protein binding and transcription rates flU. On larger 
scales, meticulous recombination experiments have shown that the nucleoid as a polymer-protein complex 
is divided into six compartments, or "macrodomains" [10]. Four of these macrodomains are defined by preferential
interaction of DNA fragments within the same domain and by their spatial colocalization within the
cell [10-12]. DNA segments within a macrodomain typically intersect, while inter-macrodomain collisions
appear to be restricted. The two remaining "nonstructured regions" are more fluid in nature. Other
experiments have probed large-scale nucleoid structure by tagging of specific loci and/or by nucleoid isolation
[13-19]. GFP-RNAP fusions have also been used in this context in an effort to identify possible
"transcription factories" [5,20-22].

The relation between nucleoid state and gene expression has been the subject of several recent studies. 
Experimentally, this question has been addressed using mainly transcriptomics and chromatin immunoprecipitation
combined with microarrays (ChIP-chip). Thanks to elegant experiments linking the length
distribution of observed supercoiled domains with the transcriptional response to locally induced supercoil
relaxation [23], we know that the expression of a gene can be affected by its localization within such an isolated
topological domain [24,25]. Thus, the same bridging nucleoid proteins that give the nucleoid a branched
structure when observed by electron microscopy may also be responsible for part of the transcriptional
regulation not accounted for by the network of transcription factors and regulated genes. Other experiments
have characterized RNA-polymerase (RNAP) and specific NSTF binding by ChIP-chip [26-28]. By monitoring
generic protein binding throughout the chromosome [29], Vora et al. have found extended protein
binding regions (called "extended protein occupancy domains" or EPODs) connected to either transcriptionally
silent or highly expressed clusters of genes. Finally, transcriptomics has been applied to the study
of the effects of the knockout of specific bridging nucleoid proteins and to changes in the level of negative
supercoiling [30-32].

Computationally, attempts have been made to characterize the binding specificity of NSTFs [33,34] and
to interrogate the one-dimensional arrangement of genes of different categories for signals possibly related
to nucleoid structure [35,36]. In particular, a thread of studies [37-40] on possible "periodicity" patterns
has found a number of interesting spatial regularities in the arrangement of genes belonging to different
categories. For example, the correlation of gene expression with gene codon bias has recently been related
to large contiguous "sectors" along the chromosome [37].

Here, we present an integrated analysis combining different independent data sources that report on (i) 
specific DNA-protein interactions, (ii) different levels of gene expression, and (iii) large scale nucleoid 
structure, in order to uncover coherent, consistent correlations between these different levels of genome 
organization. We focus on the spatial distribution along the genome of these data sets. Our statistical 
aggregation analysis shows that part of the macrodomain structure of the genome emerges directly from the 
analysis of the distribution of genes that change their expression when comparing wild-type versus nucleoid- 
perturbed conditions PT1 . In addition, by the analysis of specific nucleoid protein occupancy profiles and 
EPODs, we recover a similar structure, where the most significant regions flank the Ter macrodomain of
the nucleoid defined by Valens and coworkers [10]. In the same data, we also recover a periodicity signal
in analogy with previous studies on other genomic data sources related to transcription of E. coli genes 
and genome adaptation. Finally, a functional analysis of the clusters reveals significant enrichment of the 
flagellar and chemotaxis genes, together with regulators of biofilm formation. 



I. RESULTS

A. The distribution of genes sensitive to H-NS/Fis deletion and supercoiling perturbations is nonuniform 
along the genome. 

We began the analysis from the transcriptomics data of Blot et al. [30,31]. In these experiments
the global expression profiles of wild-type E. coli K12 were compared with mutants carrying combined
nucleoid perturbations in the form of knockouts of the NSTFs Fis or H-NS, and a mutation of
a gyrase or a topoisomerase, affecting the average supercoiling background. The differential analysis of
these experiments gives seven sets of genes significantly responding to these nucleoid-related perturbations
(Supplementary Figure S1). A simple density map of these lists shows notable peaks that correspond to
linear regions of the chromosome characterized by a stronger transcriptional reaction to nucleoid-related
perturbations with respect to other parts of the chromosome.
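
As a rough illustration of the density map described above, the following Python sketch bins a list of gene positions along the chromosome. It is not the code used for the analysis: the function name `density_map`, the default of 64 bins, and the genome length of 4,639,675 bp (taken here for E. coli K-12 MG1655) are assumptions made only for this example.

```python
import numpy as np

# Assumed genome length in base pairs (E. coli K-12 MG1655); adjust as needed.
GENOME_LENGTH = 4_639_675


def density_map(positions, n_bins=64, genome_length=GENOME_LENGTH):
    """Count genes per equal-width bin along the linear genome coordinate.

    positions: gene coordinates in bp (e.g. gene midpoints), one per gene.
    Returns (counts, edges), with len(edges) == n_bins + 1.
    """
    counts, edges = np.histogram(
        np.asarray(positions, dtype=float),
        bins=n_bins,
        range=(0, genome_length),
    )
    return counts, edges


# Toy list: three genes close to 1.9 Mb and one isolated gene at 0.3 Mb.
counts, edges = density_map([1_900_000, 1_930_000, 1_950_000, 300_000])
peak = int(np.argmax(counts))
print(f"densest bin: {edges[peak] / 1e6:.2f}-{edges[peak + 1] / 1e6:.2f} Mb")
```

Peaks of `counts` mark candidate regions; whether a peak is significant requires a comparison with randomized lists, which this sketch does not attempt.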



B. Clusters of transcriptional response to H-NS/Fis deletion and supercoiling perturbations correlate with 
macro-domain and chromosomal segment organization of the genome. 

Marr and coworkers OTTl detected clusters in the same gene lists with a threshold technique linking genes 
with the proximity threshold t in the range 1 b < t < 10 Kb. We performed a different clustering analysis
probing multiple scales, considering also the statistical significance of the one-dimensional aggregation by
comparing the peaks of the empirical histogram with the highest peak found in the histogram of randomized
lists. Our method can thus define relevant clusters and also associate a P-value to them for every given scale
(see Methods and Figure 1).
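
The idea of comparing empirical histogram peaks with the highest peak of randomized lists can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the Methods section: the null model (random gene lists of the same size drawn without replacement from all gene positions), the fixed-bin histograms, the set of scales, and the names `cluster_pvalues` and `multiscale_clusters` are all hypothetical choices.

```python
import numpy as np


def cluster_pvalues(positions, all_positions, genome_length, n_bins,
                    n_random=1000, seed=0):
    """Empirical P-value for each histogram bin at one scale (n_bins).

    The P-value of a bin is the fraction of randomized lists whose HIGHEST
    peak is at least as high as the observed count in that bin.
    """
    rng = np.random.default_rng(seed)
    positions = np.asarray(positions, dtype=float)
    all_positions = np.asarray(all_positions, dtype=float)
    observed, edges = np.histogram(positions, bins=n_bins,
                                   range=(0, genome_length))
    null_max = np.empty(n_random)
    for i in range(n_random):
        # Assumed null model: same list size, genes drawn at random.
        sample = rng.choice(all_positions, size=positions.size, replace=False)
        null_max[i] = np.histogram(sample, bins=n_bins,
                                   range=(0, genome_length))[0].max()
    exceed = (null_max[None, :] >= observed[:, None]).sum(axis=1)
    pvals = (1 + exceed) / (n_random + 1)  # add-one estimate, never zero
    return observed, edges, pvals


def multiscale_clusters(positions, all_positions, genome_length,
                        scales=(8, 16, 32, 64), alpha=0.05, **kwargs):
    """Collect (n_bins, start, end, P) for every significant bin at every scale."""
    hits = []
    for n_bins in scales:
        _, edges, pvals = cluster_pvalues(positions, all_positions,
                                          genome_length, n_bins, **kwargs)
        for b in np.flatnonzero(pvals <= alpha):
            hits.append((n_bins, float(edges[b]), float(edges[b + 1]),
                         float(pvals[b])))
    return hits
```

Using the maximum of each randomized histogram as the reference makes the test conservative with respect to the number of bins examined at a given scale.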

We systematically analyzed both supercoiling sensitive genes in the WT, ΔFis and ΔH-NS intra-strain
lists (see Methods) and the nucleoid-protein sensitive genes at fixed supercoiling condition in the
WT-ΔFis(low -σ), WT-ΔFis(high -σ), WT-ΔH-NS(low -σ), WT-ΔH-NS(high -σ) inter-strain lists.
Supplementary Figure S1 shows the clusters found. A cluster at 1.90-1.96 Mb, or 41 centisomes (P = 0.01)
of genes sensitive to supercoiling changes was found for all three strains. Another cluster at 1.10-1.20 Mb,
24 centisomes (P < 10⁻³), sensitive to supercoiling changes appears in the wild type experiment, and a
cluster at 3.80-3.81 Mb, 82 centisomes (P = 0.04) is present in the mutant lacking the Fis protein. The
deletion of Fis at fixed supercoiling background causes a variation in gene expression in a cluster at
1.95-2.03 Mb, 43 centisomes (P = 0.05), in both high and low negative supercoiling conditions and in a cluster
at 3.33-3.55 Mb, 74 centisomes (P = 0.01) only at low negative supercoiling conditions. The deletion of
H-NS at fixed supercoiling background causes the emergence of a pair of neighboring clusters respectively
at 1.97-2.05 and 2.08-2.15 Mb (around 43 centisomes) and a cluster at 1.06-1.17 Mb (24 centisomes) at
both supercoiling conditions (P < 10⁻³). Another cluster emerges in the condition of low negative supercoiling
at 1.50-1.66 Mb (34-35 centisomes, P = 0.03), a position compatible with the replication terminus,
and a cluster at 0.30-0.34 Mb, 7 centisomes (P = 0.05), appears at high negative supercoiling condition.
Notably, very similar clusters appear from the data on the genes significantly responding to Fis deletion as
a function of the growth phase, specifically in mid exponential phase [32] (Supplementary Figure S2).

A summary of the most significant clusters is shown in Figure 2 and Supplementary Table S1. At the
scale of the macrodomains, these clusters appear to overlap well with the previously observed segmented
structure of the chromosome [10,37]. At smaller scales, clusters preferentially localize towards the edges
of macrodomains or in nonstructured regions (Supplementary Figures S1 to S3). In particular, the highest
significance clusters appear at all scales in H-NS-related perturbation experiments as well as in response to
supercoiling changes in the intra-strain WT list (see Figure 2). At small scales, these clusters concentrate
rather well, within the uncertainty of the analysis, at the edges of the Ter macrodomain defined by Boccard
and coworkers [10,13], also termed fluid region by a more recent study [14], while at larger scales they
cover the entire Ter macrodomain.



C. Binding profiles of nucleoid-shaping proteins along the genome follow the same spatial patterns of the 
expression data. 

The previous analysis shows a tight link of the transcription program with the structuring of the nucleoid, 
and the macrodomains in particular, related to the action of the NSTFs Fis and H-NS. In order to test for the 
direct action of these proteins at these sites, we considered data concerning specific binding sites of Fis and 
H-NS on the chromosome. Binding sites of H-NS and Fis proteins along the E. coli genome in vivo were
extracted from the ChIP-chip data of Grainger et al. [27]. We also considered the list of Fis binding sites
obtained by high-density ChIP-chip by Cho et al. [28] and from the RegulonDB database [8] (see Methods
for the nomenclature of these experiments).

The cluster diagrams of those lists are reported in Supplementary Figure S3. The clearest clusters appear
from the experiments of Grainger and coworkers. In particular, we found three clusters with a low P-value
(P < 10⁻³) in FISbind(Grainger), of which two (at 1.12-1.14 Mb, 24 centisomes, and at 1.95-2.03 Mb, 43
centisomes) are precisely superimposed on the clusters found in the WT-ΔH-NS gene expression data sets,
lying at the border of the Ter macrodomain, and one (at 3.43-3.46 Mb, 74 centisomes) corresponds to
the cluster already found in WT-ΔFis. We also found two clusters in HNSbind(Grainger) at 1.99-2.02 Mb,
43 centisomes (P < 10⁻³) and at 2.09-2.12 Mb, 45 centisomes (P = 0.01) that correspond to the cluster
found in the transcriptomics in the WT-ΔH-NS set, at the border of the Ter macrodomain. Finally, we find a
cluster at 2.98-3.00 Mb, 64.5 centisomes (P = 0.01) and another one at 3.78-3.82 Mb, 82 centisomes (P < 10⁻³)
that can be superimposed with the border of the Ori macrodomain. The Fis binding data of Cho et al. appear
to give a weaker signal for this cluster, which however remains visible. This discrepancy may come from
the different resolution of the two studies, or from a different threshold used in the two studies to define
a binding site. Finally, the other control set from H-NS high-resolution binding data from Oshima et al.
shows all the clusters found in the Grainger data, plus additional clusters, found especially in the Ter area.
In this case, the differences are most likely due to the different growth conditions in this experiment.

Considering Fis and H-NS binding sites from the RegulonDB dataset, we found multiple clusters that are
difficult to interpret. This is possibly due to the knowledge bias of those data and the elusive nature of
NSTF binding motifs [33]. Sonnenschein and coworkers [36] found that genes that, according to current
knowledge as compiled by RegulonDB, do not participate in the transcriptional regulatory network, show a
statistical repulsion for linear genome positioning with genes that do, and they interpret this as a signature
of nucleoid-related gene regulation. Motivated by this finding, we also performed a comparative cluster
analysis on the genes included in the RegulonDB regulatory network and outside it (Supplementary Figure
S4). While the knowledge-biased RegulonDB network presents many clusters that are difficult to interpret,
we surprisingly found that the genes outside the RegulonDB network show clear clusters on the border of
the Ter macrodomain and a larger scale cluster covering the Ter macrodomain (P < 10⁻³). These regions
might be enriched in genes expressed under physical control by the nucleoid rather than by specific network
interactions. Alternatively, the unknown function of some of these operons may result in a lack of information
regarding their regulatory mechanisms.

In order to gain more insight into the binding of nucleoid related proteins on the DNA, we also considered
the extended protein occupancy domains (EPOD) experimental data produced by Vora et al. [29]. This
technology reveals protein occupancy across an entire bacterial chromosome at the resolution of individual
binding sites. EPODs are long contiguous segments of DNA-bound proteins along the chromosome, and
were found to correspond to transcriptionally silent regions (tsEPODs) or highly expressed ones (heEPODs).



We compared the density profiles of heEPOD and tsEPOD to the gene density of the other data sets. The
local contributions to the Pearson correlation coefficient of some representative data sets are plotted in
Supplementary Figure S6. The figure shows a high correlation between heEPOD density, FISbind(Grainger)
and WT-ΔH-NS(low -σ), specifically in the area around the border of the Ter macrodomain (around 2 Mb,
or 43 centisomes). The similarity of the density profiles is evident from visual inspection of the normalized
densities. As an example, Figure S5 reports the case of heEPOD and WT-ΔH-NS(low -σ). Finally,
we considered the enrichment of lists of genes within EPODs with respect to genes responding to nucleoid
perturbations (Supplementary Tables S2 and S3). heEPODs have a significant intersection with all the inter-strain
experiments as well as with the intra-strain supercoiling changes in the absence of H-NS. On the other
hand, tsEPODs significantly overlap only with the perturbation experiments related to H-NS (inter-strain),
as expected from its known role as a transcriptional repressor [41,42].
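
To make the notion of "local contributions to the Pearson correlation coefficient" concrete, the sketch below splits the coefficient between two binned density profiles into one term per bin. The decomposition used here, c_i = (x_i - mean(x)) (y_i - mean(y)) / (n s_x s_y) with population standard deviations so that the terms sum to r, is one natural choice assumed for illustration; the function name `local_pearson_contributions` is hypothetical, and the actual definition is the one given in the Methods.

```python
import numpy as np


def local_pearson_contributions(x, y):
    """Per-bin terms c_i whose sum equals the Pearson coefficient r(x, y).

    x, y: density profiles evaluated on the same bins along the genome.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # np.std defaults to the population standard deviation (ddof=0),
    # which is what makes the contributions add up exactly to r.
    return dx * dy / (x.size * x.std() * y.std())


# Toy profiles sharing a peak in the last bin.
x = [0, 1, 2, 3, 10]
y = [1, 0, 2, 5, 9]
c = local_pearson_contributions(x, y)
print(c, c.sum())
```

Bins with large positive c_i are the genomic regions where the two profiles deviate from their means in the same direction, which is how a region such as the border of the Ter macrodomain would stand out in such a plot.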

D. Functional classes of genes in the clusters. 

Hypergeometric testing of enrichment for MultiFun [43] functional categories was carried out systematically
on all the considered datasets and also on the genes contained in the most significant clusters found
in the spatial aggregation analysis. The results of the enrichment analysis are available on the web site

http://www.lgm.upmc.fr/scolarietal/
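
For readers unfamiliar with hypergeometric enrichment testing, a minimal sketch follows. It computes the probability of observing at least k genes of a functional category in a list of N genes, when the category contains n of the M genes in the genome. The function name `enrichment_pvalue` and the argument names are illustrative assumptions, and no multiple-testing correction is included.

```python
from math import comb


def enrichment_pvalue(k, list_size, category_size, genome_size):
    """One-sided hypergeometric P-value P(X >= k).

    k:             genes in the list that belong to the category
    list_size:     N, number of genes in the list (e.g. a cluster)
    category_size: n, number of genes in the functional category
    genome_size:   M, total number of genes considered
    """
    upper = min(list_size, category_size)
    # Sum exact integer counts first and divide once to limit rounding error.
    favourable = sum(
        comb(category_size, i) * comb(genome_size - category_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(genome_size, list_size)


# Toy case: all 5 genes of a 5-gene list fall in a 5-gene category out of 10.
print(enrichment_pvalue(5, 5, 5, 10))  # 1/252
```

A small P-value indicates that the overlap between the list and the category is larger than expected from random sampling of genes.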



The flagella and chemotaxis classes, in addition to several biofilm related classes, are enriched in both
of the clusters at the edges of the Ter macrodomain (Cluster 1 and Cluster 4). These classes are also
enriched in the following datasets: FISbind (Grainger), HNSbind (Grainger), ΔFis intra-strain, WT-ΔFis
and WT-ΔH-NS at both supercoiling conditions, WT-ΔH-NS (ME), and WT-ΔFis (150 and 240 min) in the
Bradley data sets. In order to compare these clusters with the flagella synthesis network, we carried out a spatial
aggregation analysis of the operons directly controlled by the FlhC transcription factor, the master regulator
of flagella gene expression. The cluster diagram is reported in Supplementary Figure S7 and shows a clear
correlation with the clusters identified by the analysis of the datasets from both Fis and H-NS binding and
transcriptomics experiments.

The role of Fis and H-NS in the regulation of flagellar gene expression has been known for some time
[44-47]. Interestingly, flagella and chemotaxis genes share the same clusters with functional classes related
to biofilm formation, such as the operons responsible for curli and capsule synthesis and the M and O antigens,
in addition to phospholipid synthesis. The genes in each experimental list can be in turn divided into
two sets, depending on whether their level of expression increases or decreases upon a given perturbation.
These sets are shown in Supplementary Table S5. The genes for motility and those for biofilm formation are
found for the most part in opposite sets. As a consequence, a given perturbation will change the expression
of most of the genes within this cluster, in opposite directions for genes in
the two functional categories.



E. Statistically significant periodicities emerge in the arrangement of nucleoid-perturbation sensitive 
genes along the genome. 

While the analysis described above identified specific large regions of high density of affected genes,
previous studies have identified specific regular structures of the nucleoid that suggest a more global regulatory
influence of nucleoid organization [37-39,48]. We thus wanted to test the possibility that the genes
that are sensitive to supercoiling variation and to deletion of Fis and H-NS may be organized in regularly
spaced groups on the chromosome. We built the histogram of the position of the genes and the histogram
of the distance between each pair of genes for each of the empirical lists.

The height of the spectral peaks for every periodicity in the empirical distributions was compared to the
distribution of global maxima of a random null model shuffling the gene lists in order to discern statistically
significant periodicities (see Methods and Supplementary Figures S8 and S9).
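
The comparison of spectral peaks with the global maxima of a shuffled null model can be illustrated with the sketch below, which treats only the position histogram (not the pairwise-distance histogram). It is an assumption-laden illustration rather than the procedure of the Methods: the Fourier power spectrum as the spectral estimate, the number of bins, the null model (random lists of equal size drawn from all gene positions), and the name `periodicity_pvalues` are choices made here for the example.

```python
import numpy as np


def periodicity_pvalues(positions, all_positions, genome_length,
                        n_bins=256, n_random=500, seed=0):
    """Power spectrum of the position histogram with empirical P-values.

    Each spectral peak is compared with the distribution of the GLOBAL
    maximum of the spectrum over randomized gene lists.
    Returns (periods, power, pvals); periods are in the units of genome_length.
    """
    rng = np.random.default_rng(seed)
    positions = np.asarray(positions, dtype=float)
    all_positions = np.asarray(all_positions, dtype=float)

    def spectrum(pos):
        counts, _ = np.histogram(pos, bins=n_bins, range=(0, genome_length))
        power = np.abs(np.fft.rfft(counts - counts.mean())) ** 2
        return power[1:]  # drop the zero-frequency term

    power = spectrum(positions)
    null_max = np.array([
        spectrum(rng.choice(all_positions, size=positions.size,
                            replace=False)).max()
        for _ in range(n_random)
    ])
    exceed = (null_max[None, :] >= power[:, None]).sum(axis=1)
    pvals = (1 + exceed) / (n_random + 1)
    # Frequency index k corresponds to k cycles per genome length.
    periods = genome_length / np.arange(1, power.size + 1)
    return periods, power, pvals
```

Note that the period resolution near a given period is limited by the bin size of the histogram, which is consistent with quoting detected periods only up to a tolerance of the order of the bin size.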



Supplementary Table S4 contains a synthetic summary of the periodicities found. A significant periodicity
was found in the position distribution of the intra-strain WT list with period length of 352 Kb and a
P-value < 0.04; a similar periodicity was found also in the distance distribution with a period of 328 Kb
(P < 0.01). Another significant periodicity emerges in the ΔFis data set, with period length of 101 Kb
(P < 0.01); this periodicity was also confirmed by a similar signal in the distance distribution at about 98
Kb. The ΔH-NS set shows a periodicity (P < 0.05) appearing in the distance distribution at 20 Kb (which
is below the resolution of the density histogram). In the WT-ΔH-NS(low -σ) list, two periodicities emerge at
385 and 660 Kb (P < 0.01). Both of them were confirmed in the distance distribution with periods of 331
and 589 Kb (P < 0.01). A highly significant peak also emerges at the very large scale of 2.3 Mb, which is
a sign of a genome-wide asymmetry that confirms the results of the cluster analysis. A periodicity of 100
Kb also exists only in the distance distribution (P < 0.05).

In the WT-ΔH-NS(high -σ) list, two periodicities were found at 385 and 675 Kb (P < 0.01). Both
of them were confirmed in the distance distribution with periodicities of 370 and 660 Kb (P < 0.01). Two
periodicities at 22 Kb and 100 Kb were also found only in the distance distribution (P < 0.05), while in
the position distribution there is a non-significant local maximum at 100 Kb. No significant periodicity was
detected in either the WT-ΔFis(low -σ) or the WT-ΔFis(high -σ) empirical lists. It is worth noting
that the periods detected in the distance distribution are systematically smaller than the periods detected in the
density. However, the discrepancy is relatively small (< 10%) and always smaller than the bin-size of the
density histogram.

In summary, we found the following significant periodicities, reported in Supplementary Table S4: one of
20 ± 36 Kb, for ΔH-NS and WT-ΔH-NS (high -σ), that could be comparable to the length of supercoiling
domains [23], and one at 100 ± 36 Kb, the same length as the solenoidal signal found by Wright and
Kepes [39]. In addition, a periodicity around 360 ± 36 Kb, which might be related to macrodomain size,
appears under perturbation by both H-NS deletion and changes in supercoiling. The compatibility threshold
of 36 Kb was set to twice the bin-size of the density distribution.

Discussion 

Efficient detection of linear aggregation. The large amount of high-throughput data being generated 
regarding genome organization and transcription requires the development of efficient approaches that are 
able to integrate different data sources. We developed a novel strategy to quantify the linear aggregation 
along the genome of different gene lists. This technique has the two main advantages of considering linear 
aggregation at all scales, and of being able to assign statistical confidence to the presence of gene clusters at 
different scales by comparing empirical data with suitable null models. We applied this method to multiple 
sets of experiments related to nucleoid protein binding and to the global transcriptional response to nucleoid 
perturbation, with a focus on the two nucleoid-shaping proteins Fis and H-NS.

Clusters of contiguous genes responding transcriptionally to nucleoid perturbations appear to follow
the macrodomain structure of the genome. The results of the analysis confirm that the transcriptional
response to nucleoid perturbations in the form of Fis/H-NS deletion and changes in the average level of
supercoiling is highly nonuniform along the genome [32], and reflects the results obtained from the
analysis of binding profiles (see below). All the data analyzed, coming from independent sources [30-32],
show multiple significant clusters whose arrangement is highly correlated with the probed spatial structure
of the chromosome, and specifically with macrodomains [10,12,14,25,49]. Macrodomains have a well-defined
spatial arrangement and localization in the cell both during chromosome segregation and during
interphase, and preserve the linear order of genes along the genome [11,13,16,49-52]. At larger scales,
the clusters generally appear to overlap well with macrodomains, while at smaller scales they preferentially
localize towards the edges of macrodomains or in nonstructured regions. This is particularly evident for
the data producing clusters in the Ter region. These clusters also superimpose well with the segments of
coherence between gene expression and codon bias [37]. Thus, we conclude that this evidence supports a
tight link between large-scale transcription programs of the cell and the spatial organization of the genome
as a nucleoprotein-polymer complex [53].

Specifically, Fis and H-NS are known to be important protein factors for the shaping of the nucleoid
[3,6]. Fis is believed to create and isolate supercoiling domains by bridging two distant DNA regions, and
is generally associated with positive transcriptional control. H-NS forms stable oligomers that bridge two
DNA helices, stabilizing a plectoneme, and thus possibly inhibiting transcription [9,54]. It is worth
noting that in the nucleoid perturbation experiments considered here, the clusters emerging from changing
the supercoiling background and those that emerge under H-NS deletion are very similar, which suggests
that the effects of these two different perturbations on the nucleoid might be related. A similar clustering
behavior has recently been reported for the transcriptional response to deletions of the nucleoid protein
HU [55]. In this study it was found that the frequency of genes influenced by the absence of HU correlates
with the macrodomain organization of the nucleoid. The genes upregulated in the absence of HU are found
for the most part in the Ori macrodomain extending to the nearby nonstructured regions, while the genes
downregulated in these conditions are found with higher frequency in the Ter and Right macrodomains. In
addition, the pattern of transcription upregulation in the HU mutant mirrors the density of gyrase sites along
the chromosome, pointing to a specific role of HU in maintaining the supercoiling homeostasis in the rRNA-rich
Ori macrodomain. On the other hand, the genes in the Ter macrodomain appear to be more sensitive
to global changes in supercoiling and to regulation by Fis and H-NS in mid exponential phase (when rRNA
expression is excluded). However, the HU regulon does not show significant overlap with the Fis and H-NS
regulons and with the genes influenced by supercoiling [30]; only a small subset of genes is found to be
co-regulated [56].

Clusters of nucleoid-shaping protein binding spatially correlate with nucleoid perturbations. The
same clustering analysis applied to the binding of Fis and H-NS from ChIP-chip data [27,28,57] reveals
clusters that correspond very well with those emerging from the transcriptomics data sets. We can also
report that our preliminary survey of very recent ChIP-seq data for the same proteins also agrees with these
findings [58]. This confirms that the two proteins also have a direct role in physically shaping the region
of the nucleoid that responds to their action. More specifically, deletion of H-NS influences the expression
of the genes in the cluster at 1.1 Mb (24 centisomes), which overlaps with the cluster of Fis binding from
the ChIP-chip dataset, pointing to a tight link between the activity of these two proteins. Moreover, the
cluster at 2 Mb (43 centisomes) is also enriched for heEPODs, high expression extended protein occupancy
domains [29].

It is important to note that while the two transcriptomics datasets and the EPOD results were obtained
from cells growing in rich media (LB or YT), the ChIP-chip data from the other two experiments were
obtained from cells growing in minimal media (M9 plus glucose or fructose), with the exception of the
Oshima data, which refer to mid-exponential growth phase in LB medium. Most data relate to cells in mid
exponential phase, which suggests that the coherent results between binding patterns and transcriptional response
to perturbation might correspond to growth-phase specific features. In addition, some variation may
also arise from the loose definition of "mid exponential phase", determining when the cells are harvested
after dilution in fresh medium. The agreement of these different datasets indicates that the positioning of
the clusters is robust within this range of experimental conditions. The dataset that directly addresses the
changes in gene expression as a function of growth phase shows that the clusters reflect changes in cellular
metabolism (see Supplementary Figure S2). In the future, we expect that further studies will directly probe
the role of nucleoid structure and organization in response to changes of both growth rate and growth phase.

Coherent periodicity signals emerge from both nucleoid protein binding and transcriptional response
to nucleoid perturbations. We also found significant evidence for periodic arrangement of the nucleoid
perturbation sensitive genes. Some of these periodicities correspond to characteristic lengths that can be
associated to supercoil domains, the ~20 Kb branched structure of genomic plectonemes [23] or the previously
observed 100 Kb periodicity of evolutionarily conserved gene sets [38]. Notably, as in the clustering
analysis, the periodicities emerging from supercoil perturbation also correspond to those related to H-NS
deletion. Evidence for spatial organization of genes along the chromosome has already been presented
for E. coli K12 [38,39,48], and models that could explain it have been formulated in the form of the
so-called "rosette model" [59], and the "solenoid model" [39,40]. In the case of our data the question
remains open regarding whether the periodic signal is simply related to the presence of clusters. This is
technically difficult to test, as it would require a null model that randomizes a list by keeping its linear
aggregation properties constant. It is possible that newly developed techniques are effective in bypassing
this problem [60].

The two main clusters at the edges of the Ter macrodomain include the whole flagella regulon and key
regulators of biofilm formation. We will now focus on the possible functional aspects of the clusters.
This analysis has identified two clusters of genes on either side of the Ter macrodomain (Clusters 1 and 4)
whose expression is affected by deletion of either Fis or H-NS and by changes in negative supercoiling [30-32].
In addition, these clusters superimpose with those obtained from the analysis of Fis and H-NS
binding measured by ChIP-chip [27, 61]. The list of genes in these two clusters includes all the operons for
flagellar proteins and several genes required for cell adhesion and the formation of biofilms. The expression
of flagella is induced when the bacteria need to swim either away from a stress or towards a richer
nutrient environment. Swimming is also necessary for the first steps of biofilm formation leading to reversible
attachment. However, flagella synthesis is soon shut down as the biofilm structure begins to form
(reviewed in [62, 63]). A similar exclusive gene expression program takes place when the cells transit from
exponential growth into stationary phase. For example, below 30 °C the genes needed for the synthesis of
curli fimbriae are induced, while expression of the flagella is repressed via a regulatory network where the
second messenger c-di-GMP is important [64].

Expression of flagella takes place in a sequential pattern corresponding to the order of assembly of
the different protein components [65, 66]. The flagellar expression cascade is controlled by the master
regulator FlhDC and by FliA, the flagella-specific sigma factor. This regulatory network also includes
post-transcriptional and post-translational regulation and feedback control [67, 68].

Symmetric clustered organization of the flagella regulon. The organization of the flagellar regulon in
the two nucleoid-related clusters identified here could suggest a differential regulation of these two sets of
genes. However, the sequential order of expression is not reflected in their linear organization along the
chromosome (Supplementary Table S5). On the other hand, their position on opposite sides of the two
replichores suggests that it may be advantageous to replicate these two clusters roughly simultaneously
instead of sequentially, in order to maintain the relative proportions of the flagellar proteins.

Moreover, the two sides of the regulon would attain on average equal accumulated supercoiling from
replication rounds, which would make (in the absence of stable topological barriers) symmetrically placed
genes on different replichores sense similar physical cues. This may confer on these regions a specific
sensitivity to changes in supercoiling and nucleoid protein abundance that play a role in differential expression
under different kinds of stresses.

Another interesting question is whether this symmetry is common across bacterial species. Certainly this
is true for closely related species such as Salmonella. Beyond this, there are indications that this symmetric
arrangement might be a general principle. Studies of bacterial comparative genomics [38] show a tendency
of cofunctional genes to be placed symmetrically with respect to replication origins. Other studies [69, 70]
uncovered evidence for preferred symmetric chromosomal inversions around the replication origin in
evolving bacteria, which would preserve a symmetric arrangement of genes.

Are the flagella regulon clusters part of a hyperstructure? At the same time, it is plausible that the
two clusters are in contact or found near each other in the cytoplasm due to the compaction of the Ter
macrodomain [71]. One can speculate that these clusters are part of a larger structure that is co-localized in
three-dimensional space, and that both spatial aspects and physico-chemical ones contribute to its function.
Following the example of the eukaryotic field, these spatial relationships could emerge from both
computational [72] and experimental studies [73]. The preference for a possible gene-colocalizing structure to
be mirror-symmetric can be argued from the fact that it would be disrupted only once per replication round by
advancing replication forks.

Direct and indirect nucleoid-based regulation of the flagella-regulon clusters. As described above, our
analysis has identified these clusters because of their high density of genes whose expression is affected by
specific nucleoid perturbations and/or their high density of nucleoid protein binding sites. Supercoiling, Fis and
H-NS are known regulators of flagellar expression in several organisms [44-47]. It is useful to discuss the
effects that these parameters may have on the expression of the global regulators or directly on the different
genes found downstream in the regulatory network.

H-NS can influence the expression of the FlhDC transcription factor both directly and indirectly through
the regulation of expression of the HdfR and RcsAB transcription factors [46, 75, 82], resulting in a feed-forward
loop mode of regulation (Figure 3). RcsAB also represses the curli fimbriae operon in Cluster 4
(csgDEFG, csgBAC) and positively regulates the expression of the colanic acid operon in Cluster 3 (20
genes). Colanic acid is an important component of the capsule necessary for mature biofilm formation [63].
The gene for RcsA is adjacent to one of the flagellar regulon in Cluster 2 and is induced in the strain lacking
H-NS (Supplementary Table S6), while the gene for RcsB, which is also involved in the regulation of acid
stress response, is found in the Left macrodomain and is independently regulated. As possible evidence for
direct regulation by H-NS, a cluster of H-NS binding is observed in the ChIP-chip data for Cluster 1 but
not for Cluster 4 (Table S1); nevertheless, some predicted H-NS sites are also found near or at three of the
operons in Cluster 4: csgDEFG, flgMN and flgBCDEFGHIJ (Supplementary Table S6) [33].



This direct regulation by H-NS can in turn be differentiated into two nonexclusive mechanisms of transcription
regulation: H-NS can directly influence the binding of RNA polymerase, but it can also cause a
change in DNA topology through the formation of oligomeric structures [54]. For example, the flagellar genes
are in general activated by the presence of H-NS independently of the supercoiling state of the DNA, in
part probably because of the induction of the fliA gene; however, the ones in Cluster 4, including the FlgM
anti-sigma factor, are also induced by a change in topology (Supplementary Table S6). On the other hand,
the flhD gene is found on all the hyp lists, confirming that it is activated by an increase in negative supercoiling,
independently of the presence of Fis or H-NS, in accordance with experimental in vitro and in vivo
observations [83].

As already mentioned for the regulation by RcsAB, in addition to the flagellar operons, Clusters 1 and
4 also contain several other genes involved in biofilm formation and maturation that are influenced by the
different perturbations to the nucleoid, as shown in Supplementary Table S6. These include the genes
coding for the O-antigen, the M-antigen, colanic acid, capsule formation, curli synthesis, antigen 43 and
several transcription factors that can affect expression of genes outside of these clusters (Figure 3), such
as RcsA, CsgD, SdiA and UvrY [63]. The change in the local DNA structure and topology affects these
two classes of genes in opposite ways, thus contributing to the transition from a motility to an adherence
phenotype (Supplementary Table S5).

While the role of H-NS in the regulation of these pathways is well known, the role of Fis is less well
defined. A large set of flagellar genes is activated by Fis under both high and low supercoiling conditions,
mostly those found in Cluster 1, while the expression of csgA is repressed by Fis at low supercoiling,
as reported in [45, 77] (Supplementary Table S6). In addition, Fis seems to mediate the supercoiling
dependence of flhC and rcsA expression, the first being in the ΔFis hyp and the second in the ΔFis rel list.
These two proteins play opposite roles in the flagellar synthesis pathway (Figure 3).

Finally, Cluster 6 in the Ori macrodomain corresponds to the chromosomal waa region containing the
operons for lipopolysaccharide (LPS) synthesis necessary for biofilm formation [63]. The genes found in
this cluster respond to changes in supercoiling in the absence of Fis, overlapping with a cluster of H-NS
binding sites as shown by ChIP-chip analysis, consistent with the presence of several predicted H-NS sites
according to Lang et al. [33]. The chromosomal map of the known genes involved in synthesis and regulation
of flagella, chemotaxis and the different stages of biofilm formation reveals a preferential localization in the
Ori and Ter macrodomains (Figure 3).

To conclude, we believe that this study shows the power of the integrated analysis of distinct datasets in 
the context of the role played by the nucleoid in transcription. In the future, in order to more easily integrate 
datasets from different sources, it will be necessary to start a common effort towards the construction of 
larger comprehensive databases and consortia collecting, sharing and analyzing data obtained with different 
high- and low-throughput techniques.

Figure 1 Procedure followed to detect one-dimensional aggregation of genes in the lists. Top panel: A
sliding-window density histogram associates to every coordinate on the circular genome the number of genes in the
empirical list in an interval surrounding the point and spanning a fixed bin-size. As an example, the density
histogram of the WT-ΔH-NS(low -σ) list at bin-size b_s = L/256 is shown; the density of each position is compared
with the P-value thresholds from the null model (Methods) in order to obtain the significant positions, which are in
turn merged with a compatibility threshold of size b_s in order to define the clusters. Bottom panel: example of cluster
diagram for the WT-ΔH-NS(low -σ) transcriptomics data, and the FISbind(Grainger) protein binding data. The
y-axis is the genome coordinate. The plot shows the clusters discovered at different scales as a function of the
number of bins L/b_s (x-axis). The box indicates the position of the peak while the whiskers span the maximal
extension of the clusters. Both cluster diagrams give clusters that localize close to the edges of the Ter macrodomain
at small scales (L/1024 ~ 5Kb to L/16 ~ 0.3Mb), and cover the Ter or the Ter+Left region at larger scales
(L/4 ~ 1.2Mb or more). The binding data have a cluster in the nonstructured Left region appearing only at small
scales (L/16 or less).

Figure 2 Graphical representation of the most significant clusters found for nucleoid perturbation transcriptomics
data (left) and for data of direct binding of Fis and H-NS (right). The clusters found are represented by colored
wedges with transparency increasing with size. The clusters found at large observation scales (L/8 ~ 0.6Mb or
more) are shown separately in the lower panels. In the drawings, the outer colored circle represents macrodomains
while the inner colored circle contains the chromosome sectors defined by Mathelier and Carbone [37]. The numbering
of the small clusters is described in Supplementary Table S1. Note the compatibility of the clusters found with the
edges of the Ter and Ori macrodomains at small scales and with different segments of the Ter region at larger scales.

Figure 3 (top panel) The main genes for flagella, chemotaxis and biofilm formation are partitioned between the Ori
and the Ter macrodomains and particularly within Clusters 1 and 4 obtained from the analysis (Figure 2). The blue
dots correspond to the flagellar and chemotaxis genes. The red dots correspond to the genes related to biofilm
formation (increased color intensity results from presence of closely spaced genes). The unlabeled red dots
correspond to various cryptic fimbriae and adhesins operons, according to the lists in Supplementary Table S6. The
gene names in bold correspond to transcription factors, and the blue lines to interactions from these transcription
factors to their targets (RegulonDB). (bottom panel) Direct and indirect regulation of flagellar and curli expression,
see refs. [46, 63, 74, 80]. For additional levels of regulation mediated by c-di-GMP see [64, 74, 81].






II. METHODS 



A. Microarray data of Fis, H-NS, and supercoil sensitive genes. 



We used microarray data from Blot et al. [30], where the wild type level of gene expression is measured
(from cells growing in rich media) relative to genetically engineered E. coli LZ41 and LZ54 strains
containing drug-resistant topoisomerase gene alleles to inhibit DNA gyrase or topoisomerase IV activity
selectively [84], thereby inducing negative supercoil relaxation (-σ < 0.033 on average, as measured
on plasmids) or increase (-σ > 0.08 on average, as measured on plasmids). In addition, these strains
were crossed with the knockouts ΔFis and ΔH-NS in order to determine the coupling of the effect of
a specific nucleoid protein with a specific level of supercoiling. These experiments define seven sets of
nucleoid-perturbation sensitive genes relative to pairs of conditions compared. The so-called "intra-strain"
sets include the genes that significantly change their expression when the negative superhelical density σ
varies in a fixed genetic background. The "inter-strain" sets include the genes that significantly change their
expression at a fixed average supercoiling background (-σ < 0.033 or -σ > 0.08) when comparing the transcript
profiles of the wild-type and knockout mutant.

In the text we refer to intra-strain lists by the name of the relative mutant, WT, ΔFis and ΔH-NS, and to
inter-strain lists by the names of the two strains compared, with a suffix indicating the supercoiling
condition, WT-ΔFis (high or low -σ), WT-ΔH-NS (high or low -σ). Supplementary Methods Figure SM1 summarizes
both the experimental sets and the gene lists.

We also considered microarray data on a ΔFis strain in different growth phases (early, mid-, late-exponential
and stationary, in rich media) from both Bradley et al. [32] and Blot et al. [30] datasets. The
lists of genes significantly changing their expression with respect to wild type are referred to by the names
of the two strains compared with a suffix indicating the growth phase, WT-ΔFis (early/mid/late/stat phase)
for Bradley data and WT-ΔFis/H-NS (LS or late stationary phase/ME or mid-exponential phase/TS or transition
phase) for data from Blot et al.



B. Transcription network. 

The transcription network interactions were compiled from the RegulonDB 6.0 database 10, which con- 
tains a concise representation of the information available from the literature about transcriptional regulation 
of all genes in E. coli. Interactions inferred purely from microarrays were filtered out in order to decrease 
the contribution of indirect effects. Of 4552 genes in E. coli, 1524 genes are in the unfiltered network in
RegulonDB, 1372 are in the filtered network and of these 166 and 140 respectively are transcription factors,
out of the 286 genes functionally annotated as transcription factors in E. coli by Gene Ontology [85, 86].



C. ChIP-chip binding profiles and protein occupancy data.



Binding sites from ChIP-chip data of H-NS and Fis proteins along the E. coli genome in vivo were
obtained from Grainger et al. [27]. These sets are referred to as FISbind(Grainger) and HNSbind(Grainger)
in the text. As a comparison, we also considered the list of Fis binding sites obtained by high-density
ChIP-chip by Cho et al. [28], identifying 894 Fis-associated binding regions (compared to the 224 regions
found by Grainger et al.), referred to as FISbind(Cho), and the list of H-NS binding data from Oshima et
al. [57], HNSbind(Oshima). We also considered data from RegulonDB for Fis and H-NS binding sites,
referred to as FISbind(RegulonDB) and HNSbind(RegulonDB) in the text. In order to quickly compare
the overlap between RegulonDB target genes and the Fis and H-NS ChIP-chip data sets, we carried out a
hypergeometric test, with results reported in Supplementary Methods Table SM1. Both ChIP-chip data sets refer to
cells grown in minimal media.
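
As an illustration of the overlap test mentioned above, the following minimal sketch computes an upper-tail hypergeometric P-value for the number of genes shared by two lists. The function name `overlap_pvalue` and the list sizes are hypothetical choices made for this example; only the genome size of 4552 genes is taken from the text, and the exact settings used in the study are not specified here.

```python
from math import comb


def overlap_pvalue(n_genome, n_list_a, n_list_b, n_shared):
    """Upper-tail hypergeometric P-value.

    Probability of observing at least n_shared common genes when a list of
    n_list_b genes is drawn at random (without replacement) from a genome of
    n_genome genes, n_list_a of which belong to the other list.
    """
    total = comb(n_genome, n_list_b)
    tail = sum(
        comb(n_list_a, k) * comb(n_genome - n_list_a, n_list_b - k)
        for k in range(n_shared, min(n_list_a, n_list_b) + 1)
    )
    return tail / total


# Illustrative numbers only: 4552 genes in the genome (as quoted in the text)
# and two hypothetical target lists of 200 and 150 genes sharing 30 genes.
p = overlap_pvalue(4552, 200, 150, 30)
print(f"P(overlap >= 30) = {p:.3g}")
```

With these hypothetical sizes the expected overlap under random drawing is about 6.6 genes, so an overlap of 30 genes gives a very small P-value.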

Finally, the data from Vora et al. (ref. [29]) were considered. In this study the amount of total protein-DNA
interactions at a specific locus in vivo was measured (from cells grown in rich media) by a modified
large-scale ChIP assay measuring generic protein occupancy along the genome (termed in vivo protein
occupancy display, IPOD). Specifically, we examined the extended (> 1 Kb) protein occupancy domains (EPODs),
divided by the authors into two populations by their median expression level (121 domains in the highly
expressed class, heEPODs, and 151 in the transcriptionally silent class, tsEPODs).



D. Statistical analysis of spatial clusters. 

We developed a statistical method for identifying clusters of genes in the lists along the genomic
coordinate. This method considers the density of genes at different scales on the genome, and compares
empirical data with results from random null models. In order to avoid spurious effects of binning, for each
gene list a density histogram was made by using a sliding window with a given bin-size b_s, as exemplified
in Figure 1. The resulting plot of the averaged density of genes for every point of the circular chromosome
was considered at different observation scales of the genome, i.e. at different bin sizes of length
b_s ∈ {L/2, L/4, ..., L/2^n}, where L is the length of the chromosome. We chose n = 10, as b_s < L/1024
is the scale of the typical gene length.

Density peaks with a significantly high number of genes (see also Figure 1) were identified by comparing
empirical data with 10000 realizations of a null model. For every bin size, the null model considers the
density histogram from a random list of the same length as the empirical one. The number of genes for
every bin in the empirical histogram was compared to the distribution of global maxima of the null model,
obtaining a P-value for the value of the empirical histogram for each bin. This procedure yields
a list of statistically significant (P < 0.05) bin positions. For each bin-size (or observation scale), clusters
were defined as connected intervals containing a significantly high proportion of the genes in the list. To
each cluster, we assigned the lowest P-value among the merged bins.
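
The procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used in the study: the function names (`sliding_density`, `find_clusters`), the grid step, the reduced number of null realizations and the toy genome are all hypothetical, and the P-value is taken simply as the fraction of null global maxima that reach the empirical count.

```python
import random
from bisect import bisect_left


def sliding_density(positions, genome_len, bin_size, step):
    """Number of list genes in a circular window of width bin_size
    centred on each grid point 0, step, 2*step, ..."""
    half = bin_size / 2
    density = []
    for center in range(0, genome_len, step):
        count = 0
        for p in positions:
            d = abs(p - center)
            if min(d, genome_len - d) <= half:  # shortest distance on the circle
                count += 1
        density.append(count)
    return density


def find_clusters(list_pos, all_pos, genome_len, bin_size, step,
                  n_null=100, alpha=0.05, seed=0):
    """Return clusters as (start, end, lowest P-value) at one observation scale."""
    rng = random.Random(seed)
    empirical = sliding_density(list_pos, genome_len, bin_size, step)
    # Null model: random gene lists of the same length; keep each global maximum.
    null_max = sorted(
        max(sliding_density(rng.sample(all_pos, len(list_pos)),
                            genome_len, bin_size, step))
        for _ in range(n_null)
    )
    # P-value of a count: fraction of null global maxima that are >= the count.
    pvals = [(n_null - bisect_left(null_max, c)) / n_null for c in empirical]
    significant = [(i * step, p) for i, p in enumerate(pvals) if p < alpha]
    # Merge significant grid points closer than bin_size (compatibility threshold).
    clusters = []
    for pos, p in significant:
        if clusters and pos - clusters[-1][1] <= bin_size:
            start, _, best = clusters[-1]
            clusters[-1] = (start, pos, min(best, p))
        else:
            clusters.append((pos, pos, p))
    # On a circular genome the last and first clusters may touch across the origin;
    # a merged cluster with start > end wraps around coordinate 0.
    if len(clusters) > 1 and clusters[0][0] + genome_len - clusters[-1][1] <= bin_size:
        start, _, p_last = clusters.pop()
        _, end, p_first = clusters[0]
        clusters[0] = (start, end, min(p_first, p_last))
    return clusters


# Toy example: 1000 equally spaced "genes"; the list contains 30 adjacent genes
# plus 10 isolated ones.
L = 100_000
genome = list(range(0, L, 100))
gene_list = list(range(20_000, 23_000, 100)) + [
    2_000, 10_000, 35_000, 47_000, 58_000, 66_000, 74_000, 83_000, 91_000, 97_000]
for start, end, p in find_clusters(gene_list, genome, L, bin_size=L // 32, step=500):
    print(f"cluster {start}-{end}, P = {p}")
```

Repeating the call for bin sizes L/2, L/4, ..., L/1024 would give one set of clusters per observation scale. Comparing against the global maximum of each null realization, rather than the null value at the same position, is what accounts for testing many positions at once.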



E. Macrodomains and chromosomal sectors. 

The locations of the chromosomal macrodomains were extracted from [10, 12, 49], and considered together
with the chromosomal sectors of ref. [37], where codon bias indices positively correlate with gene
expression. The exact coordinates used here are presented in Supplementary Methods Table SM2.



F. Periodicity analysis. 

Periodic signals in the position of genes of a given list were derived from both the density (histogram
of the start position of each gene) and the histogram of the shortest distances along the genome between
any gene pair. We computed the discrete Fast Fourier transform of these functions. For every frequency ν,
the amplitude of the Fourier spectrum is proportional to the strength of the periodic signal of period λ = 1/ν,
while the complex phase θ is proportional to a shift τ = θλ/2π of the periodic distribution with respect
to the cosinusoidal periodic distribution of that period. Note that the resolution of this analysis is limited
to a few bin sizes; since the distance distribution between gene pairs contains more data points, it
allows smaller length scales to be probed more effectively. With L the number of bases in the chromosome
(so that the maximal distance is L/2), we used a bin-size of L/256 bases for the position histogram and a bin-size
of L/2048 bases for the distance histogram.
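
A minimal sketch of the distance-histogram variant of this analysis is given below. It is an illustration under assumptions rather than the original implementation: the function names are hypothetical, a naive discrete Fourier transform replaces the FFT for the sake of a dependency-free example, and the toy genome and bin number are chosen only to keep the example small.

```python
import cmath


def pair_distance_histogram(positions, genome_len, n_bins):
    """Histogram of the shortest circular distance (range 0..L/2)
    between every pair of genes; positions are integer coordinates."""
    hist = [0] * n_bins
    pos = sorted(positions)
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d = pos[j] - pos[i]
            d = min(d, genome_len - d)
            # Integer bin index; the distance L/2 itself goes into the last bin.
            b = min(2 * d * n_bins // genome_len, n_bins - 1)
            hist[b] += 1
    return hist


def fourier_spectrum(signal):
    """Naive DFT of the mean-subtracted signal.
    Returns (k, amplitude, phase) for k = 1 .. n/2 cycles over the signal."""
    n = len(signal)
    mean = sum(signal) / n
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum((x - mean) * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append((k, abs(coeff), cmath.phase(coeff)))
    return spectrum


def dominant_period(positions, genome_len, n_bins):
    """Period (in bases), amplitude and phase of the strongest spectral peak."""
    hist = pair_distance_histogram(positions, genome_len, n_bins)
    k, amplitude, phase = max(fourier_spectrum(hist), key=lambda item: item[1])
    # The histogram spans L/2 bases, so k cycles correspond to a period (L/2)/k.
    return (genome_len / 2) / k, amplitude, phase


# Toy example: pairs of neighbouring genes repeated every 4000 bases
# on a circular genome of 64000 bases.
L = 64_000
genes = [start + offset for start in range(0, L, 4_000) for offset in (0, 100)]
period, amplitude, phase = dominant_period(genes, L, n_bins=64)
print(f"dominant period: {period:.0f} bases")
```

The same `fourier_spectrum` function can be applied to a sliding-window density histogram instead of the distance histogram; in that case the signal spans the whole chromosome, so k cycles correspond to a period of L/k.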

Peaks exhibited by empirical data were scored with the same null model used for the cluster analysis, i.e.
randomized lists of genes conserving the length of the empirical list. Given an empirical list, we generated
500 random lists of the same length. The lower sampling compared to the 10000 random lists generated for
the clustering is due to the fact that the pair distance distribution requires the storage and processing of n^2
data points instead of n, which increases the computing time accordingly. The height of the spectral
peaks for every periodicity in the empirical distributions was compared to the distribution of global maxima
of the null model (regardless of their position in the spectrum), as exemplified in Supplementary Figure S8;
a periodicity with a P-value < 0.05 was considered to be statistically significant.



G. Functional annotations. 



Functional annotations were downloaded from the MultiFun web site http://genprotec.mbl.edu/ [43].
Gene sets belonging to clusters and effective networks were probed for enrichment of functional
annotations by hypergeometric testing. P-values lower than 10 were considered significant.

Supplementary methods for Scolari et al




Supplementary Methods Figure SM1 Summary of microarray data of Fis, H-NS, and supercoil sensitive genes.

Supplementary Methods Table SM1 The table compares reported Fis (a) and H-NS (b) binding sites from
different data sources. The table reports the number of genes in the overlap between the lists, and the P-value in
parentheses.

Supplementary Methods Table SM2 (a) Start and end positions in Mb of the macrodomains defined by Boccard
and coworkers (ref. [10, 13]), and (b) chromosomal sectors defined by Mathelier and Carbone (ref. [37]).

Supplementary results for Scolari et al



Clusters from nucleoid perturbation / transcriptomics experiments

Supplementary Figure S2 Cluster diagrams for the transcription microarray Fis deletion data from ref. [32].
Different panels refer to different growth phases (early, mid-, late-exponential and stationary, in rich media), while
the last panel refers to the union of all the growth phases. Panel labels: (a) WT-ΔFis(early phase), (b) WT-ΔFis(mid
phase), (c) WT-ΔFis(late phase).

Supplementary Figure S1 Cluster diagrams for the transcription microarray nucleoid perturbation data from
ref. [30, 31].

Clusters from protein binding data:

Supplementary Figure S3 Cluster diagrams of Fis and H-NS binding from the RegulonDB [8] database and
Grainger, Cho and Oshima ChIP-chip experiments [27, 28, 57]. The binding sites of the Grainger data-set for the Fis
and H-NS experiments show the same clusters as transcriptomics data on genes responding to supercoiling changes
after H-NS deletion in microarray data from Marr et al. [30, 31]. A hypergeometric test was performed to test the
overlaps of the Grainger and Cho ChIP-chip targets with the RegulonDB database (Table SM1).

Summary of clusters from transcriptomics and protein binding data:

From transcriptomics:

Supplementary Table S1 Summary of the most significant clusters found at all scales and their P-values. The
coordinates (in bp) along the E. coli genome are:
Cluster 1 1929600-2195230
Cluster 2 1993030-2037780
Cluster 3 2096110-2141990
Cluster 4 1094210-1163310
Cluster 5 3428200-3447460
Cluster 6 3782180-3815590
Cluster 7 2981340-2996630
Note: Cluster 2 and Cluster 3 are included in Cluster 1.

Clusters of genes inside and outside the known transcription regulatory network:

Supplementary Figure S4 Cluster diagram of the genes in the RegulonDB network and outside RegulonDB
network (see ref. [36]).

Histograms of EPODs and comparison with clusters:



Supplementary Figure S5 Linear density of heEPODs along the genome (red line) compared to the density of
nucleoid-perturbation sensitive genes WT-ΔH-NS(low -σ) (black line, bin size L/32). The x-axis spans the Ter
macrodomain. Note the highly correlated regions at the border of the Ter macrodomain (at 1 × 10^6 and 2 × 10^6 bases),
in accordance with the results of Figure S6.

Supplementary Figure S6 The black lines are the local contributions to the Pearson correlation coefficient along the
genome coordinate (x-axis) between the normalized linear densities of he- and tsEPODs (ref. [29]) and the densities
of nucleoid-perturbation sensitive genes. The densities were calculated using a sliding window of size L/32.
(a) Correlation between tsEPOD and heEPOD density. (b) Correlation between tsEPOD density and
WT-ΔH-NS(low -σ). (c) Correlation between heEPOD and WT-ΔH-NS(low -σ). (d) Correlation between
heEPOD density and FISbinding(Grainger). (e) Correlation between heEPOD and FISbinding(Cho). (f) Correlation
between FISbinding(Grainger) and FISbinding(Cho).

Number of genes of heEPOD in common with:



List Experimental value Mean value P-value

Supplementary Table S2 Summary of the number of genes within heEPODs (strictly included) in common with the
genes significantly responding to Fis and H-NS deletion, and changes in supercoiling [30].



Number of genes of tsEPOD in common with: 



List Experimental value Mean value P-value

Supplementary Table S3 Summary of the number of genes within tsEPODs (strictly included) in common with the
genes significantly responding to Fis and H-NS deletion, and changes in supercoiling [30].

Clusters of genes controlled by FlhC:

Supplementary Figure S7 Cluster diagram of the genes controlled by the FlhC transcription factor (data from
RegulonDB). FlhC is a transcriptional activator that controls the operons related to assembly of the flagella. The
main regions controlled by FlhC overlap with the clusters at the border of the Ter macrodomain identified in both
transcriptomics and binding sites lists. A second cluster is present in correspondence with the border between the
Right macrodomain and the Right non-structured zone.

Periodicity analysis:



Supplementary Figure S8 Procedure followed to detect periodicities in the gene lists. Left panel: distribution of the
distance of genes in an experimental list (sliding window of bin-size L/2048), compared to a sample periodic function.
Right panel: discrete Fourier transform of the distance distribution. The peaks correspond to contributions of a
periodic function of a given period, reported on the x-axis. The figure shows the spectrum of the intra-strain WT list
(see Methods), where the peak indicates a signal for a periodicity of 328Kb. The comparison of this signal with the
distribution of the maximum of the spectra found in randomized lists gives a significance score for this periodicity.
The same procedure can be applied also to the sliding-window density histogram at a given bin size.

Supplementary Figure S9 Periodicity analysis for the case of a density histogram. The analysis (and the example
dataset from the intra-strain WT list) coincides with that presented in Figure S8, except that the left panel is a
sliding-window histogram (bin-size L/256) of the genes in the list, rather than a distance distribution.

Supplementary Table S4 Table summarizing the significant periodicities found (P < 0.05). In the table the letter A
indicates a significant periodicity found in the density histogram while B indicates a periodicity found in the distance
pair distribution. Genes sensitive to supercoiling variation in the intra-strain WT list show a compatible periodicity
of 360 ± 36Kb; this periodicity is found also in the WT-ΔH-NS lists at all supercoiling conditions. Upon Fis
deletion, supercoiling sensitive genes lose the 360 ± 36Kb periodicity, but show a new periodicity at 101 ± 36Kb,
again found also in the WT-ΔH-NS lists in all supercoiling conditions. Finally, in H-NS deletion mutants,
supercoiling sensitive genes lose the 360 ± 36Kb periodicity but show a periodicity at 20 ± 36Kb also found in the
inter-strain WT-ΔH-NS data in high negative supercoiling conditions. The compatibility condition of 36Kb was
selected as twice the bin-size of the density distribution histogram.

Clusters and the flagellar/biofilm synthesis pathway:

Cluster 1 1929600-2195230
gif
gmd, fliG, fliH, yegO, motA, hisC, znuA, fliM, gmm, fliJ, yecR, hisD, znuC, uvrY, fliM, gnd, fliK, yedW, hisF,
fliG, hchA, fliL, yeeV, hisH, wcaC, fliS, intG, fliM, znuA, hisI, wcaD, fliZ, mdtD, fliN, motB, wcaE, hisH, otsA,
fliO, ogrK, wcaF, motB, |il:0, pykA, wcaM, tar, rfbA, fliR, ruvA, wzc, yecR, rfbB, fliS, tar, yedV, rfbC, fliY, torZ,
yeeN, rfbX, fliZ, wbbK, yegJ, rtc, motB, weal, zinT, wcaM, shiA, ycdK, wzxC, yedF, torY, torZ, yedM, yedI, uad,
yegD, yegP, yegS, yegV, wbbI

Column headers: wt 41-54 (hyp, rel); Hns 41-54 (hyp, rel); Fis 41-54 (hyp, rel)

cmoA, fliJ, cpsB, cmoB, fliI, cmoB, yeeN, cmoA, cpsG, cobT, fliK, cobS, dem, rfc, dem, fliR, [flhD], [flhD], flhG,
rcsA, hisC, fliY, wcaB, flhD, hisD, gate, wcaC, fliC, yeeN, insF-5, gatD, wcaE, fliD, yeeR, lpxM, gatY, wcaF, fliS,
yeeT, rfbB, gatZ, wcaJ, gatY, yegS, yebK, hisC, wcaK, insA-5, hisD, wcaL, lpxM, hisF, wcaM, mt(A, lpxM, pas A,
yedR, motB, torZ, [sdiA], yeeN, shiA, v co U, yebK, torY, yegS, yscD, torZ, yehD, yegW, [uvrY], znuB, yedA, yegX,
znuA, yebK, yedI, yedW, yea I, znuA, wcaM, yedV, yedW, yeeE, yeeN, yeoVV, yegI, yegJ, yegW, yegZ, yodB, zinT, znuB

Color code legend: O-antigen; M-antigen capsid display/biofilm; Regulator of biofilm formation; Curli;
[transcription factor]

Supplementary Table S5 Intersection between the genes found in the transcription microarray data sets 
of Blot et al. [30] and the clusters of genes identified in this analysis. The colors indicate the gene ontology 
class. Genes from the intra-strain experiments are divided into rel and hyp columns, corresponding to 
transcripts whose expression is associated with relaxation (rel) or high negative supercoiling (hyp). The labels act 
and rep indicate activation and repression in inter-strain profiles. 
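
The intersection described in this caption can be sketched as a set operation on genes located by chromosomal coordinate. This is only an illustration under stated assumptions: the function names are hypothetical, a gene is represented by a single coordinate, and the coordinates in the usage note are placeholders rather than real annotations. Only the Cluster 1 bounds (1929600-2195230) come from the table.

```python
# Minimal sketch: which genes of a microarray list fall inside a cluster?
# Function names and the single-coordinate gene model are assumptions.
CLUSTER_1 = (1929600, 2195230)  # cluster bounds as given in the table title


def genes_in_cluster(positions, cluster):
    """Return the set of genes whose coordinate lies within the cluster bounds."""
    start, end = cluster
    return {gene for gene, pos in positions.items() if start <= pos <= end}


def intersect_with_cluster(gene_list, positions, cluster):
    """Return, sorted by name, the genes of gene_list located in the cluster."""
    return sorted(set(gene_list) & genes_in_cluster(positions, cluster))
```

For example, with placeholder coordinates `{"geneA": 1950000, "geneB": 2300000}`, only `geneA` would be reported for Cluster 1.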






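Flagellar gene expression cascade (by order of expression) 

Class  | Gene            | Cluster | Pos Regulator          | Neg Regulator               | Hns CC | Hns site (predicted) | Function 
-------|-----------------|---------|------------------------|-----------------------------|--------|----------------------|--------- 
       | flhCD           | 1       | Crp, Hns, s70, s54, s28 | Fur, OmpR, RcsAB, IHF, LrhA |        |                      | Master transcriptional regulator 
1      | fliLMNOPQR      | 2       | S70 FlhCD              |                             |        |                      | 
1      | fliE            | 2       | S70 FlhCD              |                             |        |                      | 
1      | fliFGHIJK       | 2       | S70 FlhCD              |                             | Y      | Y                    | 
1      | flgA            | 4       | S70 FlhCD              |                             |        |                      | Basal body hook 
1      | flgBCDEFGHIJ    | 4       | S70 FlhCD              |                             |        | Y                    | 
1      | flhBAE          | 1       | S70 FlhCD              |                             |        |                      | 
1      | fliA (S28)      | 2       | S28, S70 FlhCD         | NsrR                        | Y      | Y                    | Sigma 28 
1      | fliZ            |         | S28, S70 FlhCD         |                             | Y      | Y                    | Inhibitor of curli 
1      | fliY            |         | ss                     |                             |        |                      | 
III    | flgKL           | 4       | S28, S70 FlhCD         |                             |        |                      | Hook 
III    | fliDST          | 2       | S28, S70 FlhCD         |                             | Y      |                      | Hook 
lib    | fliC            | 2       |                        |                             | Y      | Y                    | Filament 
II III | flgMN           | 4       | S28, S70 FlhCD         |                             | Y      | Y                    | Anti-sigma 28 
lib    | motAB           |         | S28                    |                             |        |                      | 
lib    | cheAW, tar, tap |         | S28                    |                             |        |                      | motive force, chemosensor 
lib    | cheRBYZ         |         | S28                    |                             |        |                      | 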

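Genes involved in biofilm formation (in order of chromosome position) 

MD    | Gene         | Cluster | Regulates              | Regulated by                             | Hns CC | Hns site (predicted) | Process 
------|--------------|---------|------------------------|------------------------------------------|--------|----------------------|-------- 
right | yadCKLM      |         |                        | Hns, Crp                                 |        |                      | cryptic fimbriae 
      | sfmACDHF     |         |                        | Hns, Crp                                 |        |                      | cryptic fimbriae 
      | ycbQRSTUVS   |         |                        | Hns, Crp                                 | Y      | Y                    | cryptic fimbriae 
      | pgaABCD      | ~4      |                        | CsrA, NhaR                               |        |                      | PGA, adhesin 
      | csgD         | 4       | csg operon, bcs operon | OmpR, CpxR, Hns, IHF, RstA, Fis, ss, YdaM |        | Y                    | curli synthesis 
      | bssS         | 4       |                        |                                          |        |                      | biofilm 
      | ycgV         |         |                        |                                          | Y      | Y                    | cryptic adhesin 
      | ydaM         |         |                        |                                          |        |                      | curli inhibitor 
      | tqsA         |         |                        |                                          |        |                      | AI-2 transport, quorum sensing 
      | uvrY/uvrC    | 1       | csrB                   | LexA, SdiA                               |        | Y                    | biofilm 
      | sdiA         | 2       | uvrY, fliE             |                                          |        |                      | biofilm, cell division 
      | rcsA         | 2       | wca, fliPQR et al      |                                          |        |                      | osmolarity, membrane perturbations 
      | yedQ         | 2*      | c-di-GMP               | ss                                       |        |                      | curli regulation 
      | yeeJ         | 1*      |                        |                                          |        |                      | adhesin 
      | flu          | 1       |                        | OxyR, Dam                                |        |                      | antigen 43 
      | rfb operon   | 3       |                        |                                          | Y      | Y                    | O-antigen 
      | wca operon   | 3       |                        | RcsC/S/B/A                               |        | Y                    | colanic acid, M-antigen 
      | yegE         | 1*      | c-di-GMP               | ss                                       |        |                      | curli regulation 
      | yehABCD      | 1       |                        | Hns, Crp                                 | Y      |                      | cryptic fimbriae 
      | yfaL         |         |                        |                                          |        |                      | adhesin 
      | yfcOPQRSTUV  |         |                        |                                          |        |                      | cryptic fimbriae 
      | yfjR         |         |                        |                                          |        |                      | 
      | ypjA         |         |                        |                                          |        |                      | adhesin 
      | gutQ         |         |                        |                                          |        |                      | 
      | ttdA         |         |                        |                                          |        |                      | 
      | yraHIJK      |         |                        | Hns, Crp                                 |        |                      | cryptic fimbriae 
      | bcsABZC, EFG |         |                        | CsgD                                     |        |                      | cellulose 
      | cysE         |         |                        |                                          |        |                      | 
      | waaG         | 6       |                        | Hns ?                                    | Y      | Y                    | lipopolysaccharide synthesis 
      | tnaA         |         |                        |                                          |        | Y                    | 
      | rfaH         |         |                        |                                          |        |                      | transcriptional antiterminator 
      | yihR         |         |                        |                                          |        |                      | 
      | cpxAR        |         | motAB, cheAW, mdtA(1)  |                                          |        |                      | envelope stress 
      | yjbE         |         |                        |                                          |        | Y                    | 
      | fumB         |         |                        |                                          |        |                      | 
      | fimAICDFGH   |         |                        | Hns ?                                    |        | Y                    | fimbriae 
      | yliP         |         |                        |                                          |        |                      | 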

Supplementary Table S6 Genes involved in flagellar expression and in biofilm formation; data from refs. 
[36], [63], [74]-[80]. Flagellar and chemotaxis genes are ordered according to the sequence of expression. Biofilm genes 
are ordered according to their position on the chromosome; the first column (MD) shows the macrodomain in which 
the gene is located. The known factors regulating gene expression are indicated, as well as the 
targets of the transcription factors in the list; the abbreviation ss stands for sigma S. The presence of H-NS binding 
sites in the promoter region is shown in separate columns according to whether it was determined by ChIP-chip [27] (Hns CC) or 
by prediction from bioinformatic sequence analysis [33] (Hns site). 
Supplementary files 

Downloadable from web site: http://www.lgm.upmc.fr/scolarietal/ 